Ambiguous Frequent Itemset Mining and Polynomial Delay Enumeration

Authors

Takeaki Uno

Hiroki Arimura

Abstract

Mining frequently appearing patterns in a database is a basic problem in modern informatics, especially in data mining. In particular, when the input database is a collection of subsets of an itemset, called transactions, the problem is called the frequent itemset mining problem, and it has been extensively studied. The items in a frequent itemset appear together in many records, so they can be considered a cluster with respect to these records. In this sense, however, the condition that every item appears in each record is quite strong; we should allow a few items to be missing from these records. In this paper, we approach this problem from the viewpoint of algorithm theory and consider a model that can be solved efficiently and is potentially valuable in practice. We introduce ambiguous frequent itemsets, which allow missing items in their occurrence records. More precisely, for given thresholds θ and σ, an ambiguous frequent itemset P has a transaction set T, |T| ≥ σ, such that, on average, transactions in T include a ratio θ of the items of P. We formulate the problem of enumerating ambiguous frequent itemsets and propose an efficient polynomial-delay, polynomial-space algorithm. Its practical performance is evaluated by computational experiments. Our algorithm extends naturally to the weighted version of the problem. The weighted version is a natural extension of ordinary frequent itemset mining to weighted transaction databases, and is equivalent to finding submatrices whose cells have a large average weight. An implementation is available at the author's homepage.
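To make the definition concrete, here is a minimal Python sketch of a membership test only; it is not the paper's polynomial-delay enumeration algorithm. It assumes that "include a ratio θ" means "at least θ", that transactions are Python sets, and, for the weighted case, that each row is a dict mapping items to weights with absent items weighing 0. All function names are hypothetical. The sketch relies on one observation: for a fixed P, the best T can always be taken to be the σ transactions that overlap P the most, because any further transaction overlaps P no more than the σ-th best and so cannot raise the average.

```python
import heapq
from typing import AbstractSet, Hashable, Mapping, Sequence

Item = Hashable


def _top_sigma_sum(values: Sequence[float], sigma: int) -> float:
    """Sum of the sigma largest scores (the optimal choice of T)."""
    if not 1 <= sigma <= len(values):
        raise ValueError("need 1 <= sigma <= number of transactions")
    return sum(heapq.nlargest(sigma, values))


def best_average_ratio(P: AbstractSet[Item],
                       transactions: Sequence[AbstractSet[Item]],
                       sigma: int) -> float:
    """Largest average of |t & P| / |P| over any T with |T| >= sigma."""
    if not P:
        raise ValueError("P must be non-empty")
    overlaps = [len(P & t) for t in transactions]
    return _top_sigma_sum(overlaps, sigma) / (sigma * len(P))


def is_ambiguous_frequent(P: AbstractSet[Item],
                          transactions: Sequence[AbstractSet[Item]],
                          theta: float, sigma: int) -> bool:
    """True if some T with |T| >= sigma contains, on average, at least a
    fraction theta of P's items (assumed reading of the definition)."""
    if not P:
        raise ValueError("P must be non-empty")
    overlaps = [len(P & t) for t in transactions]
    # Compare sums instead of ratios to avoid an extra division.
    return _top_sigma_sum(overlaps, sigma) >= theta * sigma * len(P)


def is_weighted_ambiguous_frequent(P: AbstractSet[Item],
                                   rows: Sequence[Mapping[Item, float]],
                                   theta: float, sigma: int) -> bool:
    """Hypothetical weighted reading: some T with |T| >= sigma whose
    T x P submatrix has average cell weight >= theta."""
    if not P:
        raise ValueError("P must be non-empty")
    # Average over T x P equals (sum of row sums over P) / (|T| * |P|).
    row_sums = [sum(row.get(i, 0.0) for i in P) for row in rows]
    return _top_sigma_sum(row_sums, sigma) >= theta * sigma * len(P)
```

For example, with the transactions {a, b, c}, {a, b}, {a, c}, {b}, {d} and σ = 3, the itemset {a, b, c} is contained in only one transaction, so it is not frequent in the ordinary sense; its three best transactions contain 3 + 2 + 2 = 7 of 9 possible items, so it qualifies as ambiguous frequent for θ = 0.75 but not for θ = 0.8. With 0/1 weights the weighted test gives the same answers.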

Similar resources

An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining

Mining frequently appearing patterns in a database is a basic problem in informatics, especially in data mining. Particularly, when the input database is a collection of subsets of an itemset, the problem is called the frequent itemset mining problem, and has been extensively studied. In the real-world use, one of difficulties of frequent itemset mining is that real-world data is often incorrec...

A Closed Frequent Subgraph Mining Algorithm in Unique Edge Label Graphs

Problems such as closed frequent subset mining, itemset mining, and connected tree mining can be solved in a polynomial delay. However, the problem of mining closed frequent connected subgraphs is a problem that requires an exponential time. In this paper, we present ECE-CloseSG, an algorithm for finding closed frequent unique edge label subgraphs. ECE-CloseSG uses a search space pruning and ap...

On SAT Models Enumeration in Itemset Mining

Frequent itemset mining is an essential part of data analysis and data mining. Recent works propose interesting SAT-based encodings for the problem of discovering frequent itemsets. Our aim in this work is to define strategies for adapting SAT solvers to such encodings in order to improve models enumeration. In this context, we deeply study the effects of restart, branching heuristics and claus...

Efficient Maximal Frequent Itemset Mining by Pattern-Aware Dynamic Scheduling

While frequent pattern mining is fundamental for many data mining tasks, mining maximal frequent itemsets efficiently is important in both theory and applications of frequent itemset mining. The fundamental challenge is how to search a large space of item combinations. Most of the existing methods search an enumeration tree of item combinations in a depth-first manner. In this thesis, we develop...

Polynomial-Delay Enumeration of Monotonic Graph Classes

Algorithms that list graphs such that no two listed graphs are isomorphic, are important building blocks of systems for mining and learning in graphs. Algorithms are already known that solve this problem efficiently for many classes of graphs of restricted topology, such as trees. In this article we introduce the concept of a dense augmentation schema, and introduce an algorithm that can be use...

Publication date: 2008
